R for Spatial Analysis: An Overview

Tom Cunningham

1 Dec, 2023

What is R?

  • R is a programming language used by over 2 million people:

    • Focused on data analysis and statistics.
    • Commonly used through the RStudio environment, which provides a helpful user interface.
    • Completely free and open source.
  • R can be downloaded from: https://cran.r-project.org/ and RStudio from: https://posit.co/download/rstudio-desktop/


How do you use R?

  • R is used programmatically, through typing commands and functions rather than clicking.

  • This means the main file is a list of instructions called a script, rather than a project file as in ArcGIS.

  • Within a script, you can read in data, make changes and create plots/maps. The script itself is the most important thing, because re-running it recreates everything else.
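
As a rough illustration of what such a script looks like, the sketch below uses mtcars, a small dataset that ships with R, rather than any data from these slides. The object and column names (cars, kpl) are made up for the example.

```r
# A minimal script: read in data, make a change, create a plot.
# `mtcars` is a built-in R dataset, used here only for illustration.
cars <- mtcars

# Make a change: add fuel economy in km per litre (1 mpg is about 0.425 km/l).
cars$kpl <- cars$mpg * 0.425

# Summarise the new variable and plot it against car weight.
summary(cars$kpl)
plot(cars$wt, cars$kpl,
     xlab = "Weight (1000 lbs)",
     ylab = "Fuel economy (km per litre)")
```

Running the whole script again from the top reproduces the same summary and plot.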

An RStudio window

Example of quick analysis in R

Example of a spatial analysis in R

Typical R workflow

  1. Load required packages.

  2. Import data (including shapefiles).

  3. Pre-process data (clean, create new variables etc.).

  4. Conduct analysis and modelling.

  5. Create outputs, like maps and plots.

So, let’s give it a go.

Making a map of economic inactivity in Manchester with R

Step 1: Load packages

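library(dplyr)    # data manipulation
library(sf)       # spatial data
library(ggplot2)  # plots and maps

  • Packages are useful add-ons to R that help with specific tasks. They need to be installed once (with install.packages) before they can be loaded. Here we load 3 packages: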

    • dplyr: a package commonly used to aid data manipulation

    • sf: the main package for working with spatial data in R

    • ggplot2: a package to create nice looking plots (and maps)
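
Loading a package with library() fails if it has not been installed yet. One possible way to check first is sketched below; the variable names (needed, is_installed, missing_pkgs) are just illustrative, and the code only reports what is missing rather than installing anything.

```r
# Packages used in this walkthrough.
needed <- c("dplyr", "sf", "ggplot2")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise.
is_installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!is_installed]

if (length(missing_pkgs) > 0) {
  # Install these once with install.packages(), e.g. install.packages("sf").
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are installed.")
}
```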

Step 2: Import data

lsoa21_sf <- st_read("data/Lower_layer_Super_Output_Areas_2021_EW_BFC_V8_8154990398368723939/LSOA_2021_EW_BFC_V8.shp")


emp21 <- read.csv("data/2021lsoa_econ_act_manchester.csv",
                  skip = 9)
  • The next stage is to import the data you’ll be using. File paths are relative to R’s working directory, so here the files sit in a data folder inside the directory you are working in.

  • Here we import two files:

    • a shapefile of all 2021 LSOA boundaries in England and Wales (using the st_read function)

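    • a CSV of 2021 Census economic activity data for LSOAs in the Manchester local authority (using the read.csv function; skip = 9 tells R to ignore the first 9 lines of the file, which come before the column headers)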
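
To see what the skip argument does, here is a small self-contained sketch. It writes a made-up CSV with two lines of notes above the header to a temporary file, so the file contents and the name demo are assumptions for illustration, not the real Census download.

```r
# Relative paths such as "data/..." are resolved against the working directory.
getwd()

# Write a tiny, hypothetical CSV with two lines of notes above the header row.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "Hypothetical table title",
  "Downloaded for illustration only",
  "mnemonic,Total",
  "A001,100",
  "A002,250"
), tmp)

# skip = 2 ignores the two note lines, so the third line becomes the header.
demo <- read.csv(tmp, skip = 2)
names(demo)  # "mnemonic" "Total"
nrow(demo)   # 2
```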

Step 3: Join data and shapefile

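# Match LSOA21CD in the shapefile to mnemonic in the CSV
emp21_sf <- inner_join(lsoa21_sf,
                       emp21,
                       by = c("LSOA21CD" = "mnemonic"))

  • As in other GIS software, data needs to be joined to the shapefile using an ID column. We can do this in R using the inner_join function from dplyr, which keeps only the rows that have a match in both tables.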

  • Here we create a new object called emp21_sf - this is just a name I’ve chosen. The arrow symbol (<-) stores the result of the join under that name.
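
Because inner_join keeps only matching rows, joining the England and Wales boundaries to the Manchester-only CSV should leave just the Manchester LSOAs. A minimal sketch with made-up data frames (boundaries, census and their values are hypothetical, and plain data frames stand in for the sf object):

```r
library(dplyr)

# Hypothetical stand-ins: three areas with boundaries, census data for two of them.
boundaries <- data.frame(LSOA21CD = c("X1", "X2", "X3"),
                         name = c("Area one", "Area two", "Area three"))
census <- data.frame(mnemonic = c("X1", "X3"),
                     Total = c(100, 250))

# Join on the ID columns, which have different names in each table.
joined <- inner_join(boundaries, census, by = c("LSOA21CD" = "mnemonic"))

nrow(joined)     # 2: area X2 is dropped because it has no census row
joined$LSOA21CD  # "X1" "X3"
```

When the first table is an sf object and the sf package is loaded, the result keeps its geometry column, which is what allows it to be mapped later.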

Step 4: Create new variable for inactivity rate

emp21_rate <- mutate(emp21_sf, 
                     ec_inact_perc = Economically.inactive..excluding.full.time.students./Total * 100)
  • We now need to calculate the economically inactive variable as a percentage of the LSOA total. New variables like this can be created using the mutate function.

  • Any changes you make to data in R do not affect the original files on disk unless you explicitly write them back - so you don’t need to worry too much about making mistakes!
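
The sketch below illustrates both points on a made-up data frame (toy, inactive and inact_perc are hypothetical names). It also shows where the dots in the long column name presumably come from: read.csv converts column headers into valid R names, and the header text used here is a guess at the original.

```r
library(dplyr)

# Hypothetical counts for two areas.
toy <- data.frame(inactive = c(20, 50), Total = c(100, 200))

# mutate() returns a new data frame with the extra column.
toy_rate <- mutate(toy, inact_perc = inactive / Total * 100)
toy_rate$inact_perc  # 20 25

# The original object is unchanged.
names(toy)           # "inactive" "Total"

# read.csv() cleans headers with make.names(): spaces, brackets and
# hyphens each become a dot.
make.names("Economically inactive (excluding full-time students)")
# "Economically.inactive..excluding.full.time.students."
```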

Step 5: Create a map

ggplot(emp21_rate, aes(fill = ec_inact_perc)) +
  geom_sf()
  • Now we can plot the variable on a map.

  • There are a few ways to do this, but we use ggplot() from the ggplot2 package, one of the most common ways of plotting in R, together with geom_sf(), which draws spatial (sf) objects.

The final map


The RStudio window now

We can also make the map better…
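
One possible set of improvements is sketched below. It is not the code behind the original slide: it uses the North Carolina counties shapefile bundled with the sf package as a stand-in for emp21_rate, and the variable bir74_perc, the title and the styling choices are all illustrative assumptions.

```r
library(sf)
library(ggplot2)

# Stand-in data: the nc.shp file that ships with the sf package.
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Each county's share of the births recorded in the BIR74 column.
nc$bir74_perc <- nc$BIR74 / sum(nc$BIR74) * 100

better_map <- ggplot(nc, aes(fill = bir74_perc)) +
  geom_sf(colour = NA) +                          # drop boundary lines
  scale_fill_viridis_c(name = "% of births") +    # continuous viridis palette
  labs(title = "Share of births by county (BIR74)",
       caption = "Data: nc.shp from the sf package") +
  theme_void()                                    # remove axes and background

better_map
```

The same layers could be added to the Manchester map by swapping in emp21_rate and ec_inact_perc.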

… or even interactive!

So, why use R as a GIS?

Benefits of R over Arc/QGIS

  • Reproducible: typing out commands means that there is a record of your work.
    • Useful for you to look back on.
    • Useful for others to know what you’ve done.
    • Doesn’t affect raw data.
  • Scalable: R is better at dealing with large amounts of data.

  • More features: There are >20,000 packages currently available on CRAN.

  • It’s free! R is completely free and open source.

  • Can combine with other analysis: Easy to conduct non-spatial analysis on imported data.

Drawbacks of R compared with Arc/QGIS

  • Not as quick for simple mapping: For making quick and good-looking maps, R isn’t always the best.

  • Less immediate: There is a level of abstraction from the data you are using.

  • You need to know R: An initial learning curve that gets easier the more you use it.

If you want to know more…

These slides (made with Quarto in RStudio!) can be found on my GitHub: https://github.com/tmcunningham